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Abstract 

Recent legislative and local school accountability efforts have placed a premium on the collection, 
analysis, and use of student assessment data. Beyond state-based summative student achievement data, 
schools are searching for in-year information about student performance. As a result, school leaders are 
searching for the formative assessment system that meets their needs; they are looking for the formative 
assessment system that is just right. 



NOTE: This module has been peer-reviewed, accepted, and sanctioned by the National Council of 
Professors of Educational Administration (NCPEA) as a significant contribution to the scholarship 
and practice of education administration. In addition to publication in the Connexions Content 
Commons, this module is published in the International Journal of Educational Leadership Prepa¬ 
ration, * 1 Volume 4, Number 3 (July - September, 2009). Formatted and edited in Connexions by 
Theodore Creighton, Virginia Tech. 


1 Introduction 

The pressure to improve student performance in schools is currently battering education on two fronts. 
First, the public demands our education prepare our students to compete in a global economy. International 
assessment results reminds us of the gap between our students and their international peers (see Friedman, 
2007). The second front has been educational law. Assessment and educational accountability were directly 
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linked in 2001 with the No Child Left Behind (NCLB) legislation. This legislation ushered in a new era 
of accountability rooted in the collection, analysis, and use of student assessment data for educational 
improvement. 

As a result, school districts and their leaders must raise the stakes on educational assessments. Beyond 
using state-level data, leaders search for other assessments to use within schools during the school year to 
determine student achievement growth between state-level assessments. Two things are now clear in schools 
today: (1) the assessment “tail [is] definitely.. .wagging the curriculum/ instruction canine” (Popham, 2004, 
p. 4), and (2) it is up to school leaders to find a formative assessment system that meets the needs of their 
students, teachers, and parents. More specifically, schools are searching for an assessment that can show 
within year growth on learning objectives, diagnose within year learning needs of students, and to predict 
achievement level on the state assessment. 

The demand for formative assessment is clear. Schools are employing a variety of Formative Assessment 
Systems (FAS) ranging from “home-grown” tests created by teachers themselves to over 20 commercially 
packaged assessment systems costing $12 or more per student (Militello, Sired, & Schweid, 2008). FAS 
combine tightly knit assessment instruments, data-warehousing, analysis, and reporting batteries (Sharkley 
& Murnane, 2006). 

Formative Assessment Systems (FAS) are a fast-growing and under-studied phenomenon, with major 
implications for educator practice. Moreover, school leaders are uniquely positioned to access (e.g., purchase), 
assess (e.g., monitor and evaluate), and support and resource (e.g., train and develop) assessment data in 
schools. The purpose of this article is to provide a framework for school leaders who have to make important 
decisions about formative assessment systems. 

2 The Goldilocks Dilemma 

Fit matters. Finding the “just right” match between what something has to offer and the intended utility 
should be of the upmost importance to consumers. The story of Goldilocks provides an allegory of the search 
for the “just right” fit. Goldilocks’s metrics for fit included the temperature of porridge, size of chair, and 
comfort of bed. As educators and policymaker metrics for fit of an assessment system should include both 
the purpose of the assessments (e.g., properties of assessment including validity) and the intended uses hy 
school educators (e.g., lesson planning) (Militello, Sired, & Schweid, 2008). 

Educational assessments come in many shapes and sizes. Large-scale assessments can be either criterion- 
referenced (e.g., NCLB state-level assessments) or norm-referenced (e.g., TIMSS, NAEP, SAT, ACT). Small- 
scale assessments tend to be conducted and analyzed at the classroom level and may include valuable 
qualitative understandings of the teachers. Somewhere in the middle reside formative assessments. Wiliam 
(2006) describes, an assessment is formative only if information about what is being assessed results in 
change that would otherwise not occur. Under this guideline, “An assessment of curriculum is formative 
if it shapes the development of that curriculum. An assessment of a student is formative if it shapes that 
student’s learning” (p. 3). Here the definition of formative assessment becomes more detailed to the utility 
for teacher to understand if and how students are learning. This type of formative assessment has been 
called cognitively diagnostic. Leighton and Gierl (2007) define cognitively diagnostic assessment as a means 
to “measure specific knowledge structures and processing skills in students so as to provide information 
about their cognitive strengths and weaknesses” (p. 3). While “educators are demanding . . . they receive 
instructionally relevant results from any assessment . . . and that these assessment be sufficiently aligned 
with classroom practices to be of maximum instructional value,” (Huff & Goodman, 2007, p. 24) there exist 
little evidence that there are assessment systems available to meet these local demands. 

The issue of assessment fit is more difficult when looking at teacher as the end-users. While most 
formative assessment systems claim utility for classroom teachers, there is little evidence that the right 
assessment data is being provided them. Two frameworks help further unpack the search for an assessment 
system that meets the needs of teachers. First, the National Research Council’s “assessment triangle” offers 
anchors of assessments including the ability to: (1) diagnose student cognition within a specific subject area, 
(2) conduct student observations that elicit responses from students and offer evidence of competencies, 
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and (3) make fair and accurate inferences about student achievement (National Research Council, 2001). 
Additionally, the literature has generated normative features of the meaningful, effective use of teacher- 
level formative assessments including (see Brunner et ah, 2005; Coburn & Talbert, 2006; Heritage, 2007; 
Kerr, Marsh, Schuyler Ikemoto, Darilek, & Barney, 2006; Marshall, 2008; Militello & Sykes, 2006; Murnane, 
Sharkey, & Boudett, 2005; Streifer & Shumann, 2005; Wayman & Stringfield, 2006; Wylie & Ciofalo, 2008): 

• Assessments that are linked to a curriculum that is aligned with the district scope and sequence and 
state curricular benchmarks; 

• Assessments that provide timely, student diagnostic-level data; 

• Ability to disaggregate data with other datasets (e.g. other student achievement data, perceptional 
data, etc.) and to easily access and communicate reports with a variety of audiences, and 

• Availability of on-going professional development and immediate on-site assistance to translate data 
into instructional knowledge. 

In an effort to illustrate why this issue of fit is important for current and future educational leaders we 
describe the characteristics of three formative assessment systems currently available. Next, we analyze the 
characteristics of each system in relationship to the concept of fit. 

3 Northwest Evaluation Association (NWEA) Measures of Academic Progress 
(MAP) 

The MAP Math test consists of 52 multiple-choice items. According to the NWEA Technical Manual (2005), 
there are at least 1,500 items in the item pool for each subject area tested in a state with a minimum of 
200 items per goal area. MAP tests are supposed to be un-timed, but the manual recommends allotting 75 
minutes to ensure all students will finish without being rushed. A one-month testing window is necessary 
because each student needs access to a computer terminal. MAP utilizes computerized-adaptive testing 
technology. This assigns test items to students by matching the difficulty of an item to the achievement 
level of the student. Another key feature is that all items within each subject area pool are calibrated onto 
a common scale. This scaling allows students to be placed onto this same scale even though they respond 
to different items. The common scale also allows for analysis of student growth across time. State-specific 
item pools are created from the “universe” of all MAP items so that the pools used for a particular state are 
best matched to the state’s curriculum frameworks. 

4 ATI’s Galileo 

Galileo is a system for building benchmark assessments available through Assessment Technology Incorpo¬ 
rated (ATI). The Galileo system has a large bank of items and is designed to enable ATI to work collab- 
oratively with a district to design an assessment system that is aligned with local instruction and informs 
curriculum planning. ATI begins by working with a district to identify the number of benchmark assessments 
to be administered in a given year and the standards (objectives) to be measured at each subject area in 
each grade level. Districts are able to custom order assessments through Galileo’s “Educational Management 
System.” ATI has an online Benchmark Planner in which the district defines the assessment goals, specifies 
the standards to be measured and the number of items per standard, and reviews preliminary versions of 
the assessments. 

Galileo has several reports that summarize students’ performance at the individual and aggregate levels. 
Aggregate level reports can be produced at the class, school, and district level, and many can be created 
interactively to suit the user’s needs. A primary report for the benchmark assessments is the Developmental 
Profile Report. This report lists and describes all the standards (objectives) measured and provides an 
achievement level classification for each student for each standard. 
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5 The ASSISTment System 

5.1 

The ASSISTments system is an Internet based tool developed at Worcester Polytechnic Institute (WPI) 
and Carnegie Mellon University. The ASSISTment System blends assessment and instructional assistance. 
The system assesses student learning asking middle school level mathematic questions (drawn from a bank 
of more than 1400 questions generated from project staff, local teachers, and state-level assessment release 
items). The system assists students by providing immediate feedback and scaffolding questions if a wrong 
answer is given. 

The Screenshot below shows a sample ASSISTment item that involved understanding algebra, perimeter, 
and congruence. Like a human tutor, the ASSISTment System breaks an item down to the individual steps 
needed to solve the problem. By tracking where students’ understanding ends, the system can track the 
specific components of a student’s learning and give teachers data on the precise pieces of knowledge students 
have and don’t have mastered. For example, a student is initially presented a real state assessment item that 
has several skilled needed (congruence, perimeter and equation solving) to solve it correctly. If the student 
had gotten the question correct he would have gotten credit toward all three skills. If a student gets the 
question incorrect he would be given a question that probes his understanding of congruence. If the student 
gets that correct, the student gets credit towards understanding congruence. If he does not get the question 
correct, he can ask for help. The student is forced to eventually ask for as much help as needed so they 
can answer the question and move to the next scaffolding question focused on perimeter. The screenshot 
shows a buggy message (triggered by common student misconceptions) as well as two hint messages. After 
the student gets the second scaffolding question correct he will be given one focused on the algebra question 
solving x (Not shown in the Screenshot). 



A number of reports can be immediately accessed by teachers. An item-level report can be generated to 
see student responses on specific questions. Teachers can also generate student-level reports. In this report, 
teacher can analyze individual student progress and provide narrative communication to the student and 
their parents/guardians. Teachers can see the number of items taken, percentage correct, time interval, and 
hint requests. Finally, the ASSISTments system has a parental and teacher notification features. Parents are 
sent an email from the system that describes their child’s performance on ASSISTments. Additionally, when 
teachers assign nightly homework with the system, they are sent an email that describes who completed the 
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homework and specific items each student got right or wrong. 

6 Assessments In Action or Inaction 

The actual use of the three aforementioned FAS were studied (see Militello & Schweid, 2009; Militello, 
Sired, & Schweid, 2008). Table 1 below provides a summary of a number of features of the three FAS. In 
regard to NWEA’s MAP, this FAS is most appropriate for use at the district-level. Specifically, because 
the data generated by the assessment gives district administrators longitudinal scores, patterns from year- 
to-year can assist them in their decision-making (e.g., professional development opportunities). However, 
because the MAP data does not generate item-level data reports, teachers find little utility in its use. ATI’s 
Galileo provides a rich set of interim or benchmark assessments that school-level educators are able to use 
to monitor student gains on what was recently taught. This FAS also puts pressure on teachers to teach 
to the curriculum. Of the three systems, teachers used only one, the ASSISTment System, in a real-time, 
cognitive diagnostic manner. 


Summary of Formative Assessment System Technical Features 


Feature 

NWEA’s MAP 

ATI’s Galileo 

ASSISTments 

Score Reporting (Time 
& Access) 

Quick-24 hrs. (teacher 
access) 

Quick- 24 hrs. (teacher 
access) 

Immediate(teaeher ac¬ 
cess) 

Content Valid¬ 

ity / Alignment 

Moderate 

Excellent 

Excellent 

Diagnostic Scores 

Moderate 

Good 

Excellent 

Norm-Referenced Info 

Good 

Unavailable 

Unavailable 

Measurement Precision 

Excellent 

Unknown 

Unknown 

Equated scales 

Excellent 

Unknown 

No 


Table 1 


Teachers using the ASSISTment System create customized homework problem sets. This can be as simple 
as a teacher asking student to input answers from textbook problem into the ASSISTments System. In class 
assignments can also be administered for instant response. The ASSISTment System can be used by all 
students simultaneously and the results can be displayed anonymously. This included open response items. 
Teachers also use the system to make student work more accessible with the variety of reports available. 
These reports include reports that are automatically generated and attached to emails to both parents and 
teachers. The parent notification email provides a detailed account of ASSISTment completion and success. 
For teachers, reports are emailed regarding homework completion and success. 

Teachers also have open source coding access to develop their own items. This is an iterative process 
where questions along with scaffolding sub-questions and hint messages are constantly created and vetted. 
In summary, this system collects data efficiently and provides student-level diagnostic results as a means 
to impact a teachers teaching and students learning 2 . The data allows for classroom-level decision-making 
by teachers for day-to-day instruction (e.g., reteaching strategies) as well as long-term state-assessment 
goals (short-term regrouping). ASSISTments also develops teacher ownership of the data as they spend 
time on analyzing data not grading and creating their own local questions to be uploaded. For students, 
ASSISTments is not just another test. Rather an important benefit of the system is that students receive 
tutoring while being assessed. 

2 A number of studies have been conducted to evaluate the impact of ASSISTment use. Initial studies found that AS¬ 
SISTments in a good assessor of student knowledge (Feng & Heffernan, 2006; Feng Sz Heffernan, 2007; Razzaq, Heffernan, & 
Linderman, 2007). ASSISTments also have prognostication powers in regard to future state-level examinations (Feng, Beck, 
Heffernan, Koedinger, 2008). Perhaps most importantly, ASSISTments provides teachers with useful student-level knowledge 
they can reflect on, talk to colleagues about, and adjust their pedagogy. 
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7 Conclusion 

The Goldilocks fable helps the readers understand the search process educators must undertake to find the 
“just right” fit of a formative assessment system. The brief description of the three assessment systems and 
their use in schools spotlights that fit (and misfit) matters. The answer to the question, “Which formative 
assessment should we use?” should be: It depends. The utility of a formative assessment system is predicated 
on the end-use. That is, how educators want to use assessment data should guide their consumerism in the 
selection of a product. Figure 2 below is a model of assessment fit. In this model the specific assessments fit 
specific data generated and end-use. 


E 

<d 


NWEA’s 

MAP 


ATI’s 

Galileo 


c 

CD 

E 


WPI’s 

ASSISTments 


Classroom/ School/ 

Student Benchmark Longitudinal 



End-Use /Data Type 


Figure 1: Fit of the Three Assessment Systems 


Currently, the ubiquitous assessment data are over-hyped and under-utilized; yet, schools continue to 
be fed a steady diet of tests. The real power of assessments lies in the transformation of raw data and 
disseminated information into explicit knowledge to guide instructional improvement. The high demands to 
use data, coupled with the inadequate training and pervasive fear, result in the phenomenon of pedagogical 
practices geared toward tests and less on good instructional practice (Earl & Katz, 2002). As schools send 
out search parities to find and implement these systems, they would be well off to take with them the 
importance of fit. 

Formative assessment companies aren’t inherently bad. They have filled a niche. However, schools and 
educational policymakers need to be better consumers. Importance must be placed on the intended use of 
FAS and the characteristics of the systems must be assessed. Armed with such information will allow them 
to make more informed decisions. School leaders have a role to play in the process of formative assessment 
understanding and implementation. School leaders would be well served if they (1) understood the concept 
of assessment fit, (2) build teachers’ capacity to use assessments that provide student-level diagnostic data, 
(3) provide adequate resources and support mechanisms, and (4) monitor the use of assessment data. 

Finding the “right fit” between the purpose of an assessment system and the intended uses by local 
educators is an important issue. Asking teachers to use data to inform their teaching in order to advance 
student achievement requires careful consideration. The constant press to use “data” may result in the use 
of any data that is readily available. Such misfit leads to inappropriate uses and, at worst, it can lead 
to poor pedagogy and student confusion. Appropriate uses of formative assessment data will require local 
educators to develop efficacy toward assessments. We posit that this is a function of utility (how teachers 
can actually use the data in their practice) and outcomes (teachers can see student growth as a function of 
using the data in their practice). As more and more assessments bombard schools, we should not embrace 
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a Luddite mentality, railing against all tests. Rather we should develop our capacity to discriminate among 
assessment types to embrace, train, and use those assessments that are “just right” for our students. The 
future of formative assessment systems may be bright if they are: technology-based, curricula aligned, readily 
accessible to parents and educators, useful to students, and giving teachers information about what students 
are thinking, how they are learning, and strategies they are employing. Only when educators find the 
assessment that is “just right” will we be able to feed the practice of teachers and improve the achievement 
of students. 
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